Spatial data is any type of data that directly or indirectly references a specific geographical area or location. These can be, for example, geographic features on the landscape or environmental properties of an area such as temperature or air quality.
Spatial data can be continuous or discrete just like regular data, and in both cases it can be represented as a vector or a raster. Vector data uses points and lines to represent spatial data, while raster data represents data in a grid of pixels where each pixel/cell represents a specific geographic location and the information therein. Raster data will be heavily influenced by the size of the pixels/cells, i.e. resolution.
Both vector and raster data are planar representations of the world, a 3-dimensional sphere, and as such are not perfect copies. Depending on how the planar representation is created it will distort more or less certain areas of the world. Therefore many representations exist. These are called projections, as the representations project the 3-dimensional spheric image into a planar, 2-dimensional image. Maps with different projections are not comparable and cannot be overlaid. Therefore, we need to make sure we always work on the same projection when using spatial data.
To locate points geographic features on a map, you also need a system of coordinates that define the x-y coordinates for every point on the map. The system of latitude and longitude is the most familiar. These coordinate systems can be used to extract area and distance information from maps. A combination of a projection and a set of coordinates is known as a Spatial Reference System (SRS). Your choice of SRS will depend on the source of your data and your spatial analysis needs. In general, there is a trade-off between the extent of an SRS and its accuracy for measurement. For example, the WGS84 system of latitude and longitude covers the entire earth, but it is not useful for measurement because degrees of longitude become closer together the closer you are to the poles. Systems based in UTM or State Plane coordinates are much more accurate for measuring distances and areas, but they are only useful for covering specific regions.
To describe SRS systems around the world, every system is given a specific “EPSG code.” For example, the familiar system of latitude / longitude has the EPSG code 4326. All GIS software, including spatial mapping functions in R, allow users to transform our spatial data between different SRSs using these EPSG codes.
In summary, every spatial data set comes with two primary considerations:
Packages used to read and manipulate data include the sf
package, which reads a shapefile as a spatial data frame, and the
terra package that reads the shapefiles as a Spatvector.
Previously there was also the raster package, but we will
try to avoid it as it has been deprecated. We have also included
ggplot2 and the ggmap package, which has
useful mapping functions on top of ggplot2.
library(dplyr)
library(tidyr)
library(sf)
library(terra)
library(ggplot2)
library(ggmap)
In this section we will read and manipulate vector data in R.
Features can be points (red) representing specific x,y locations, such as a trees or camera sites; polygons (white) representing areas, such as forests or residential areas; and lines (yellow/green and blue) representing continuous linear features, such as roads or rivers
Vector data reads as a data frame would. Each row is a feature and each column is an attribute, and it contains a geometry column where the xy coordinates for the shapes are stored. Plotting these data will plot the points or shapes in the map using the xy coordinates stored for each feature.
## Simple feature collection with 6 features and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -122.3607 ymin: 47.64339 xmax: -122.3607 ymax: 47.64339
## Geodetic CRS: WGS 84
## speciesname locationid date geometry
## 1 Procyon lotor SEWA_N01_DRP2 2019-07-03 05:02:21 POINT (-122.3607 47.64339)
## 2 Procyon lotor SEWA_N01_DRP2 2019-07-03 06:04:45 POINT (-122.3607 47.64339)
## 3 Procyon lotor SEWA_N01_DRP2 2019-07-03 06:12:10 POINT (-122.3607 47.64339)
## 4 Procyon lotor SEWA_N01_DRP2 2019-07-03 06:13:16 POINT (-122.3607 47.64339)
## 5 Procyon lotor SEWA_N01_DRP2 2019-07-05 03:55:53 POINT (-122.3607 47.64339)
## 6 Procyon lotor SEWA_N01_DRP2 2019-07-05 04:05:21 POINT (-122.3607 47.64339)
The output in the section above shows example vector data for points
representing captures of wildlife at camera traps. The top of the output
describes the data, including the reference system (WGS 84, which is the
standard for lat/lon data) and minimum and maximum extent of the data.
Each row has attribute data (species, a location ID, and the capture
date of that species). The final column stores the spatial data. You can
see that it is POINT data and the coordinates of each
point.
In this section, you will learn the following methods for working with point data
Point data can be obtained directly from a shapefile or a csv file
where each row is a feature. In this case study, we will work with
camera trap site data and the information collected at each site,
i.e. point. The camera trap sites here are located in Seattle, and have
captured coyote and raccoon presence and absence from the 2019 spring
season to the 2021 winter season. The data is stored as a data frame in
a csv called captures.csv. The next two lines read the data
and show the top few lines.
captures.table <- read.csv("data/captures.csv")
head(captures.table)
## speciesname locationid date latitude longitude
## 1 Procyon lotor SEWA_N01_DRP2 2019-07-03 05:02:21 47.64339 -122.3607
## 2 Procyon lotor SEWA_N01_DRP2 2019-07-03 06:04:45 47.64339 -122.3607
## 3 Procyon lotor SEWA_N01_DRP2 2019-07-03 06:12:10 47.64339 -122.3607
## 4 Procyon lotor SEWA_N01_DRP2 2019-07-03 06:13:16 47.64339 -122.3607
## 5 Procyon lotor SEWA_N01_DRP2 2019-07-05 03:55:53 47.64339 -122.3607
## 6 Procyon lotor SEWA_N01_DRP2 2019-07-05 04:05:21 47.64339 -122.3607
Although the data you imported has Latitude and Longitude columns, R
does not recognize it as “spatial” data. Those two columns could be any
numbers, and they just happen to have names that we recognize as
referring to geography. Our first job will be to save these data
as a “spatial features data frame.” This is a special type of
data frame that has a column in a format that any GIS software will
recognize as geographic. This will allow us to observe these points in
the map and extract environmental information based on their location.
We will use the st_as_sf() function from the
sf package, and we will specify the projection (crs).
This function has three arguments:
coords: the names of the columns that hold our
coordinates. Note that Longitude comes before latitude. This is because
the pattern is x,y, and Longitude is the Earth’s x-axis.crs: the Coordinate Reference System.How do we know which projection our data is in? Because the data are in lat/lon format, they are in the WGS 84 projection, and the EPSG code for this projection is 4326. With other sources of point data, you may need to ask your data provider for the CRS before working with it.
captures.spatial <- st_as_sf(captures.table,
coords = c("longitude","latitude"),
crs = 4326)
head(captures.spatial)
## Simple feature collection with 6 features and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -122.3607 ymin: 47.64339 xmax: -122.3607 ymax: 47.64339
## Geodetic CRS: WGS 84
## speciesname locationid date geometry
## 1 Procyon lotor SEWA_N01_DRP2 2019-07-03 05:02:21 POINT (-122.3607 47.64339)
## 2 Procyon lotor SEWA_N01_DRP2 2019-07-03 06:04:45 POINT (-122.3607 47.64339)
## 3 Procyon lotor SEWA_N01_DRP2 2019-07-03 06:12:10 POINT (-122.3607 47.64339)
## 4 Procyon lotor SEWA_N01_DRP2 2019-07-03 06:13:16 POINT (-122.3607 47.64339)
## 5 Procyon lotor SEWA_N01_DRP2 2019-07-05 03:55:53 POINT (-122.3607 47.64339)
## 6 Procyon lotor SEWA_N01_DRP2 2019-07-05 04:05:21 POINT (-122.3607 47.64339)
With a spatial data frame, you get the same data frame as before with two new things.
Geodetic CRS row that confirms
our data are in WGS84 format.geometry. This is the
column that has converted the Latitude and Longitude columns into
spatial data we can use in calculations.Let’s observe the spatial distribution of the points by plotting them
using the ggplot2 package. The geom_sf()
function will allow us to plot the spatial data frame object.
ggplot(captures.spatial) + geom_sf()
We will want to add a reference so we can easily distinguish between locations because for now there is no basemap in this plot. We can use the Google Maps API for this. This tutorial uses an example API key to show the map. If you want to run the code yourself, there are a couple preliminary steps you need to take.
Once you have a Google Maps API key, you can use it to start a new
mapping session with the register_google() function.
my_api <- 'ENTER YOUR KEY HERE'
register_google(key = my_api)
If you don’t want to mess around with the Google Maps API for this tutorial, skip ahead to Plotting coyote detections on a map.
Now we can get a map relevant to our region using the
get_map() function. This can be done both using a bounding
box with coordinate information if we want a specific study area, or
just the city’s name.
seattle <- get_map("seattle", source= "google", api_key = my_api)
ggmap(seattle)
That map looks nice, but it will unfortunately cut off the camera sites at the eastern end of the study area. If we use a bounding box that shows the whole study area, the code will look like this:
seattle <- get_map(location = c(left = -122.5, bottom = 47.35,
right = -121.9, top = 47.8),
source ="google", api_key = my_api)
ggmap(seattle)
Now we can plot our camera site locations on the Seattle map.
ggmap(seattle) +
geom_sf(data=captures.spatial, inherit.aes = FALSE)
Now lets plot on a map the coyotes captured at each the camera trap sites. We will filter the data based on species name, using the dplyr package, and count detections at each site. We will then plot using the function seen above, but setting point size based on the number of detections at each site.
coyotes <- filter(captures.spatial, speciesname == "Canis latrans") %>%
group_by(locationid) %>%
summarize(detections = n())
ggmap(seattle) +
geom_sf(data = coyotes, inherit.aes = FALSE, aes(size = detections)) +
ggtitle("Coyote detections") +
labs(size = "Detection frequency") +
scale_size_continuous(breaks=seq(100, 500, by=100))
If you aren’t using the Google Maps API, you can replace the second command in that code block with the following. It won’t have the basemap, but you will be able to see the relative number of detections in the plot.
ggplot() +
geom_sf(data = coyotes, inherit.aes = FALSE, aes(size = detections)) +
ggtitle("Coyote detections") +
labs(size = "Detection frequency") +
scale_size_continuous(breaks=seq(100, 500, by=100))
See if you can make the same map, but for raccoons. (Solution is hidden below or on the next page in the PDF.)
raccoons <- filter(captures.spatial, speciesname == "Procyon lotor") %>%
group_by(locationid) %>%
summarize(detections = n())
ggmap(seattle) +
geom_sf(data = raccoons, inherit.aes = FALSE, aes(size = detections)) +
ggtitle("Raccoon detections") +
labs(size = "Detection frequency") +
scale_size_continuous(breaks=seq(100, 500, by=100))
Now that you’ve made a simple map with point data, it’s time to move on to lines.
In this section, you will learn the following methods for working with line data
We will look into vector data in the form of lines using the TIGER database for Washington, composed of primary and secondary roads. The spatial object will be read in the same way as we did the points, but in this case we will load directly the shapefile containing the features, downloaded from here
The dataset contains 6 attributes (fields) for each feature. Because
the data are already in a spatial format, we use the
st_read() function from the sf package to
import the data.
roads <- st_read("maps/roads/tl_2019_53_prisecroads.shp")
## Reading layer `tl_2019_53_prisecroads' from data source
## `/Users/jordanma/work/research/analysis/sucp/uwin_workshop/spatial_analysis_workshop_UWIN/maps/roads/tl_2019_53_prisecroads.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 2912 features and 4 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -124.6384 ymin: 45.55945 xmax: -117.0359 ymax: 49.00241
## Geodetic CRS: NAD83
Let’s plot the dataset to see how it looks. We will only plot one of the attributes, otherwise it will plot one map for each attribute.
plot(roads[,1])
As you can see from the map, our road layer covers the entire state.
We would like to narrow it down to our study area. We have coordinates
for the boundaries of our study area from the previous section.
Unfortunately, the spatial data we are using is not in the right
projection to use latitude and longitude values to crop it, so we first
need to transform the data. This is an essential step
in nearly any spatial analysis, so you will become very familiar with
the st_transform() function.
roads.lat <- st_transform(roads,crs="EPSG:4326")
We crop to the same dimensions as the bounding box from the previous
section with st_crop(). The function arguments are slightly
different than the ggmap function (e.g., “xmin” instead of
“left”) but the numbers are the same.
roads.clip <- st_crop(roads.lat, xmin = -122.5, ymin = 47.35,
xmax = -121.9, ymax = 47.8)
Although the WGS84 CRS is widely useful, we are going to convert our data to a different CRS for the rest of the analysis. The CRS we will use is the Universal Transverse Mercator (UTM) for Zone 10, which covers a strip from the equator to the north pole, including the state of Washington. Its CRS code is 26910. It has two primary advantages over WGS84:
roads <- st_transform(roads.clip, crs="EPSG:26910")
Now that we have the roads data cropped and in units of meters, we
can estimate the length of these roads, which will come in handy when
estimating road density in a certain area. The st_length()
function from the sf package estimates the length of each
road.
roads$length <- st_length(roads)
head(roads)
## Simple feature collection with 6 features and 5 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: 548982.9 ymin: 5259053 xmax: 560846.4 ymax: 5291810
## Projected CRS: NAD83 / UTM zone 10N
## LINEARID FULLNAME RTTYP MTFCC geometry
## 26 1104485670344 Aurora Ave N M S1200 LINESTRING (549171 5283097,...
## 27 1104485654497 Aurora Ave N M S1200 LINESTRING (549171 5283097,...
## 28 110854236203 Bronson Way N M S1200 LINESTRING (560046.5 525905...
## 35 110854235195 Sunset Blvd N M S1200 LINESTRING (560691 5259783,...
## 40 1104485698473 Aurora Ave N M S1200 LINESTRING (548999 5291259,...
## 41 1104485670345 Aurora Ave N M S1200 LINESTRING (549179.3 528263...
## length
## 26 720.3650 [m]
## 27 719.3267 [m]
## 28 532.1332 [m]
## 35 250.7223 [m]
## 40 552.6963 [m]
## 41 458.0921 [m]
Notice that the unit [m] is printed in the column. We
can convert to numeric if we don’t want the unit directly in the column
using as.numeric()
roads$length <- as.numeric(roads$length)
You can estimate total length for each road type, or any other
attribute, with the aggregate() function.
road_lengths <- aggregate(length ~ RTTYP, data=roads, FUN="sum")
print(road_lengths)
## RTTYP length
## 1 I 280107.5
## 2 M 471699.1
## 3 S 512049.7
Sometimes it is useful to convert lines to polygons, for example when
we want a better representation of the area a linear feature occupies.
This might be good for connectivity analysis as road width might define
crossing probability or for considering impervious surface generated by
roads. For this we use the st_buffer() function and decide
a buffer size we will use for the linear feature expansion. Notice that
the “Geometry type” for our layer is now POLYGON, which is
the subject of the next section.
roads_p <- st_buffer(roads, dist=12) # buffer to 12 meters
plot(roads_p[,1])
head(roads_p)
## Simple feature collection with 6 features and 5 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 548970.9 ymin: 5259041 xmax: 560858.4 ymax: 5291822
## Projected CRS: NAD83 / UTM zone 10N
## LINEARID FULLNAME RTTYP MTFCC geometry
## 26 1104485670344 Aurora Ave N M S1200 POLYGON ((549149.6 5283118,...
## 27 1104485654497 Aurora Ave N M S1200 POLYGON ((549162.6 5283111,...
## 28 110854236203 Bronson Way N M S1200 POLYGON ((560068 5259090, 5...
## 35 110854235195 Sunset Blvd N M S1200 POLYGON ((560690.5 5259804,...
## 40 1104485698473 Aurora Ave N M S1200 POLYGON ((548991.6 5291277,...
## 41 1104485670345 Aurora Ave N M S1200 POLYGON ((549165.9 5282679,...
## length
## 26 720.3650
## 27 719.3267
## 28 532.1332
## 35 250.7223
## 40 552.6963
## 41 458.0921
In this section, you will learn the following methods for working with polygon data
Polygon data are data that delimit an area. The shape of this area might represent specific physical features, such as buildings, or it might delimit an area with similar characteristics, for example residential areas, parks, or forest.
We can read a polygon dataset, like points and lines, with either the
sf package or the terra package. The main
difference is only the speed at which certain processes happen, but most
functions are found in equivalent versions in both packages. For this
section, you will perform the same set of operations with both packages.
We have pre-pended sf:: to all functions from
sf and terra:: to all functions from
terra to make their source clear. This isn’t necessary for
using the function, but it will make which package you are using more
clear.
sfJust like with lines, we use st_read to import a
shapefile. We will import a layer of forest cover across the study area.
This layer was originally downloaded from Open Street Map and has been
filtered for forest layers and cropped to the study area extent.
forest_sf <- sf::st_read("maps/Seattle_forest.shp")
## Reading layer `Seattle_forest' from data source
## `/Users/jordanma/work/research/analysis/sucp/uwin_workshop/spatial_analysis_workshop_UWIN/maps/Seattle_forest.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 3388 features and 44 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -122.5 ymin: 47.35 xmax: -121.9 ymax: 47.80039
## Geodetic CRS: WGS 84
To view the data, use the plot function. To avoid making
one map for each feature, plot just the first attribute.
plot(forest_sf[,1])
Let’s see what CRS these data are plotted in. Look for the row with
"EPSG" in it.
sf::st_crs(forest_sf)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
Because this is in WGS 84 / EPSG:4326, we need to transform it to EPSG:26910.
forest_sf <- sf::st_transform(forest_sf,crs="EPSG:26910")
Finally, we’re going to calculate the area of each polygon in the
dataset, append that value to the data frame in a new column called
area, and calculate the total area of forest cover in the
extent of the study area. Because our data are in a projection in
meters, the values we obtain will be in m^2, so we will also convert the
area to km^2 by dividing by 1000000.
forest_sf$area <- as.numeric(sf::st_area(forest_sf) / 1000000)
sum(forest_sf$area)
## [1] 398.6176
terraNow you can perform the same sequence using the functions in
terra.
forest_terra <- terra::vect("maps/Seattle_forest.shp")
When we plot a Spatvector from terra, we don’t need to specify one attribute. It draws only one map regardless.
plot(forest_terra)
Re-project to EPSG 26910.
terra::crs(forest_terra)
## [1] "GEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4326]]"
forest_terra <- terra::project(forest_terra,"EPSG:26910")
Calculate the area of forest cover.
forest_terra$area <- as.numeric(terra::expanse(forest_terra) / 1000000)
sum(forest_sf$area)
## [1] 398.6176
In this section, you will learn the following methods for working with raster data
We will focus on three raster datasets, two numerical rasters: Normalized Vegetation density index (NDVI), downloaded directly from EarthExplorer, and the global human settlement building density layer GHSL. We will also use one categorical raster: a land use / land cover map from the 2008 USGS Land Cover Database, which categorizes the landscape into several categories for developed area and natural areas.
We have previously cropped these rasters in order to keep only Washington state and speed up the computational process.
To load the rasters we use the rast() function from the
terra package.
NDVI <-rast("maps/NDVI_WA.tif")
BUILT <- rast("maps/BUILT_WA.tif")
LULC <- rast("maps/LULC_WA.tif")
We will directly reproject the rasters to a common projection. Be patient: the land use layer takes a little while to re-project.
NDVI <- project(NDVI, "EPSG:26910")
BUILT <- project(BUILT, "EPSG:26910")
LULC <- project(LULC, "EPSG:26910")
## |---------|---------|---------|---------|=========================================
We will further crop the Washington state layer to the extent of our study area using one of the layers we created in the previous section as a cookie-cutter. Let’s use the forest layer we extracted from OSM read as a Spatvector.
NDVI_c <- crop(NDVI, forest_terra)
BUILT_c <- crop(BUILT, forest_terra)
LULC_c <- crop(LULC, forest_terra)
plot(NDVI_c)
plot(BUILT_c)
The first two rasters contain numerical data only – each cell
describes a quantity that is measurable on a continuous scale. Each
value in the land use data, on the other hand, corresponds to a specific
type of cover based on this
legend. Because our data also has the auxilliary file
LULC_WA.tif.aux.xml, we can automaticaly see the
translation between the numerical values in the raster and their
corresponding category names. If we didn’t have this file, there are
methods to classify raster data into categories, although that is beyond
the scope of this tutorial.
plot(LULC_c)
When we are dealing with many rasters, we can stack them together and apply functions directly to all of them, but first they have to perfectly match in terms of extent. We will first project one using another as a cookie-cutter and then stack. We will stack the numerical rasters only for now and rename them.
NDVI_c <- project(NDVI_c, BUILT_c)
#now we can stack them
stack <- c(NDVI_c, BUILT_c)
names(stack) <- c("NDVI","BUILT")
plot(stack)
With rasters we can also do math, the function will be applied to each cell against each overlaying cell. For example if we substract building density from vegetation density. This technique is useful for comparisons among similar rasters, for example comparing NDVI between two time periods.
stack_diff <- NDVI_c - BUILT_c
plot(stack_diff)
If we want to have more control over the map and add more layers, we
can use ggplot. To plot a raster with ggplot,
we first need to convert the raster to a data frame. The
xy=TRUE argument when we create the data frame saves the
coordinates of each pixel in the raster.
LULC_df <- as.data.frame(LULC_c, xy=TRUE)
names(LULC_df) <- c("x","y","LandCover")
ggplot(LULC_df) +
geom_raster(aes(x = x, y = y, fill = LandCover))
Yikes! That’s probably not what you were hoping for. The downside of converting our data to a data frame is that we lost our nice color scheme for the categories. We will need to recreate that color map as a data table. We’ve looked up the colors from the original map and created a new data table that identifies the correct color for each land cover category.
lulc_colors <- data.frame(LandCover = c("Open Water", "Perennial Ice/Snow",
"Developed, Open Space", "Developed, Low Intensity",
"Developed, Medium Intensity",
"Developed, High Intensity",
"Barren Land (Rock/Sand/Clay)", "Deciduous Forest",
"Evergreen Forest", "Mixed Forest", "Shrub/Scrub",
"Grassland/Herbaceous", "Pasture/Hay", "Cultivated Crops",
"Woody Wetlands", "Emergent Herbaceous Wetlands"),
color = c("#466B9F", "#D1DEF8", "#DEC5C5", "#D99282",
"#EB0000", "#AB0000", "#B3AC9F", "#68AB5F",
"#1C6330", "#B5C58F", "#CCBA7C", "#E3E3C2",
"#CACA78", "#E8D1D1", "#DCD93D", "#A8EBEB"))
color_map <- setNames(lulc_colors$color,lulc_colors$LandCover)
We’ll save our map this time before printing it.
lulc_map <- ggplot(LULC_df) +
geom_raster(aes(x = x, y = y, fill = LandCover)) +
scale_fill_manual(values = color_map)
lulc_map
We can plot the coyote detections on to our map.
lulc_map +
geom_sf(data = coyotes, inherit.aes = FALSE, aes(size = detections)) +
ggtitle("Coyote detections") +
labs(size = "Detection frequency") +
scale_size_continuous(breaks=seq(100, 500, by=100)) +
xlab("UTM E") + ylab("UTM N")
Oh no! What happened? If you squint, you can see a tiny map of Seattle in the upper right corner. This happened because our coyote layer is still in WGS 84, but the land use map is in UTM Zone 10N. The important point here is that if your projections don’t line up, you may still get a result, but it won’t be at all what you expect. Let’s re-project the coyote data then recreate this map so it looks nice. Try those two steps yourself before checking the answer in the space below (web) or the next page (PDF).
coyotes_utm <- st_transform(coyotes, crs = "EPSG:26910")
lulc_map +
geom_sf(data = coyotes_utm, inherit.aes = FALSE, aes(size = detections)) +
ggtitle("Coyote detections") +
labs(size = "Detection frequency") +
scale_size_continuous(breaks=seq(100, 500, by=100)) +
xlab("UTM E") + ylab("UTM N")
This section will pull everything together. You will use some of the layers from the previous parts of the tutorial to extract environmental covariates then build models using your detection data. This case study shows how integrating your spatial analysis into R allows you to perform a complete analysis in one setting instead of bouncing back and forth between GIS and R.
In this section you will do the following:
For this section we will use the points dataset representing the camera trap sites, and the geospatial data in the vector and raster section, to extract the geospatial data relevant to each camera trap site. Some of the steps in the next section are redundant to work you did earlier in the tutorial, but we wanted to put everything in this case study that you would need to perform the analysis from start to finish but with fewer explanatory details.
We will first generate a buffer polygon for each camera site, with
radius 500m from the camera trap location. For each camera trap buffer,
we will measure road density, total forest and grass surface area, and
mean vegetation and built up density. We will use
sf for all vector data and terra for
rasters.
When we make the spatial data layer, we need to create it with EPSG:4326 because the data are in latitude / longitude. We then transform it to the correct CRS. By chaining together the commands with pipes, we can create our layers in one step without creating intermediate data frames.
# Import all captures and set CRS
captures <- read.csv("data/captures.csv") %>%
st_as_sf(coords = c("longitude","latitude"),
crs = 4326) %>%
st_transform(crs="EPSG:26910")
Get a spatial data frame of all coyote captures that includes locations that have 0 captures.
# Create a data frame with all locations
all_locations <- select(captures,locationid) %>%
distinct()
# Filter captures for coyotes and summarize by location
# Converting it to a data frame drops the spatial component from one of the data sets
coyote_temp <- as.data.frame(filter(captures, speciesname == "Canis latrans")) %>%
group_by(locationid) %>%
summarize(detections = n(), .groups = 'drop')
# Left join with all locations to include the locations with 0 captures
coyotes <- all_locations %>%
left_join(coyote_temp, by = "locationid") %>%
replace_na(list(detections = 0))
roads <- st_read("maps/roads/tl_2019_53_prisecroads.shp") %>%
st_crop(xmin = -122.5, ymin = 47.35,
xmax = -121.9, ymax = 47.8) %>%
st_transform(crs="EPSG:26910")
## Reading layer `tl_2019_53_prisecroads' from data source
## `/Users/jordanma/work/research/analysis/sucp/uwin_workshop/spatial_analysis_workshop_UWIN/maps/roads/tl_2019_53_prisecroads.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 2912 features and 4 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -124.6384 ymin: 45.55945 xmax: -117.0359 ymax: 49.00241
## Geodetic CRS: NAD83
forest <- st_read("maps/Seattle_forest.shp") %>% # Shapefile already cropped
st_transform(crs="EPSG:26910")
## Reading layer `Seattle_forest' from data source
## `/Users/jordanma/work/research/analysis/sucp/uwin_workshop/spatial_analysis_workshop_UWIN/maps/Seattle_forest.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 3388 features and 44 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -122.5 ymin: 47.35 xmax: -121.9 ymax: 47.80039
## Geodetic CRS: WGS 84
grass <- st_read("maps/Seattle_grass.shp") %>% # Shapefile alrady cropped
st_transform(crs="EPSG:26910")
## Reading layer `Seattle_grass' from data source
## `/Users/jordanma/work/research/analysis/sucp/uwin_workshop/spatial_analysis_workshop_UWIN/maps/Seattle_grass.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 12785 features and 44 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -122.5 ymin: 47.35013 xmax: -121.9022 ymax: 47.80039
## Geodetic CRS: WGS 84
NDVI <- rast("maps/NDVI_WA.tif") %>%
project("EPSG:26910") %>%
crop(forest)
BUILT <- rast("maps/BUILT_WA.tif") %>%
project("EPSG:26910") %>%
crop(forest)
LULC <- rast("maps/LULC_WA.tif") %>%
project("EPSG:26910") %>%
crop(forest)
## |---------|---------|---------|---------|=========================================
# Generate buffer areas around camera traps
coyotes_buff <- st_buffer(coyotes, dist=500)
# Intersect buffers with road segments
car.int <- st_intersection(coyotes_buff, roads)
car.int$length.int <- st_length(car.int)
# Sum all road segment lengths in km within each buffer
car.sum <- aggregate(length.int ~ locationid,
data = car.int,
FUN = sum) # total length of roads in buffer
car.sum<- car.sum %>% mutate(car_length_km = round(as.numeric(length.int)/1000)) #convert to km
# Merge road lengths into coyote data
# Convert NA to 0
coyotes_buff <- left_join(coyotes_buff, car.sum, by = "locationid") %>%
replace_na(list(car_length_km = 0))
gra.int <- st_intersection(grass, coyotes_buff)
gra.int <- gra.int %>% mutate(grass_area_km2 = as.numeric(st_area(gra.int))/1000^2)
grass_area <- aggregate(grass_area_km2 ~ locationid, data = gra.int, FUN = "sum")
coyotes_buff <- left_join(coyotes_buff, grass_area, by = "locationid") %>%
replace_na(list(grass_area_km2 = 0))
for.int <- st_intersection(forest, coyotes_buff)
for.int <- for.int %>% mutate(forest_area_km2 = as.numeric(st_area(for.int))/1000^2)
forest_area <- aggregate(forest_area_km2 ~ locationid, data = for.int, FUN = "sum")
coyotes_buff <- left_join(coyotes_buff, forest_area, by = "locationid") %>%
replace_na(list(forest_area_km2 = 0))
It is important to change all NDVI values <0 to NA, as these will
bias our estimates for buffer areas near the shore. We also need to
create a temporary SpatVector from coyotes_buff to
terra can work with both sets of data. The results of the
extract() function are vectors in the same order as the
locationid column, so we will make a new data frame with
these results then join it with coyotes_buff.
coyotes_buff_sv <- vect(coyotes_buff)
NDVI <- ifel(NDVI<0, NA, NDVI)
NDVI_mean <- extract(NDVI, coyotes_buff_sv, fun='mean', na.rm=TRUE)[,2]
BUILT <- ifel(BUILT<0, NA, BUILT)
BUILT_mean <- extract(BUILT, coyotes_buff_sv, fun='mean', na.rm=TRUE)[,2]
raster_means <- cbind.data.frame(locationid = coyotes_buff$locationid,
NDVI_mean, BUILT_mean)
# Don't need to replace NAs because there aren't any in raster_means
coyotes_buff <- left_join(coyotes_buff, raster_means, by = "locationid")
The table coyotes_buff now has everything we need to
eventually run stats. It has environmental covariates for an occupancy
model. Because occupancy modeling is beyond the scope of this tutorial,
we’ll demonstrate with a couple simple linear models comparing
environmental variables to the number of detections. Even though
coyotes_buff is an sf object, we can run
statistics directly on it like any other data frame in R.
summary(glm(detections ~ BUILT_mean + forest_area_km2, data=coyotes_buff))
##
## Call:
## glm(formula = detections ~ BUILT_mean + forest_area_km2, data = coyotes_buff)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.19738 65.35269 0.095 0.925
## BUILT_mean 0.02778 0.03152 0.881 0.389
## forest_area_km2 0.95092 126.74844 0.008 0.994
##
## (Dispersion parameter for gaussian family taken to be 14553.74)
##
## Null deviance: 307915 on 22 degrees of freedom
## Residual deviance: 291075 on 20 degrees of freedom
## AIC: 290.53
##
## Number of Fisher Scoring iterations: 2
summary(glm(detections ~ BUILT_mean, data=coyotes_buff))
##
## Call:
## glm(formula = detections ~ BUILT_mean, data = coyotes_buff)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.57471 40.72508 0.161 0.873
## BUILT_mean 0.02764 0.02508 1.102 0.283
##
## (Dispersion parameter for gaussian family taken to be 13860.75)
##
## Null deviance: 307915 on 22 degrees of freedom
## Residual deviance: 291076 on 21 degrees of freedom
## AIC: 288.53
##
## Number of Fisher Scoring iterations: 2